Neural Computing and Applications
Grammatical facial expression recognition using customized deep neural network
architecture
--Manuscript Draft--
Manuscript Number:
Full Title: Grammatical facial expression recognition using customized deep neural network
architecture
Article Type: Original Article
Keywords: Computer Vision; Grammatical facial expression recognition; Brazilian sign language
classification; Customized deep neural network
Corresponding Author: Devesh Walawalkar, B.Tech.
Veermata Jijabai Technological Institute
Mumbai, Maharashtra INDIA
Corresponding Author Secondary
Information:
Corresponding Author's Institution: Veermata Jijabai Technological Institute
Corresponding Author's Secondary
Institution:
First Author: Devesh Walawalkar, B.Tech.
First Author Secondary Information:
Order of Authors: Devesh Walawalkar, B.Tech.
Order of Authors Secondary Information:
Funding Information:
Abstract: This paper proposes to expand the visual understanding capacity of computers by
helping them recognize human sign language more efficiently. This is carried out through
recognition of the facial expressions that accompany the hand signs used in this
language. The paper focuses specifically on the popular Brazilian sign language
(LIBRAS). While much literature has been dedicated to classifying different hand signs
into their respective word meanings, the emotion or intention with which the words are
expressed has largely not been taken into consideration. As we know from everyday
human experience, words expressed with different emotions or moods can carry
completely different meanings. Lending computers the ability to classify these facial
expressions can add another level of understanding of what a deaf person exactly
wants to communicate. The proposed idea is implemented through a deep neural
network with a customized architecture, which learns the specific patterns in individual
expressions much better than a generic approach. With an overall accuracy of 98.04%,
the implemented deep network performs excellently and is thus fit to be used in
practical scenarios.
Suggested Reviewers: Tom Mitchell
Professor, Carnegie Mellon University
mitchell@cs.cmu.edu
His work interests are primarily in machine learning and computational neuroscience,
which are core to my paper topic.
Tai-Sing Lee
Professor, Carnegie Mellon University
tai@cs.cmu.edu
His research primarily concerns learning hierarchical neural codes in the visual
system.
Jason Corso
Associate professor, University of Michigan
jjcorso@umich.edu
His interests include computer vision and artificial intelligence.
Honglak Lee
Associate professor, University of Michigan
honglak@umich.edu
His interests especially include deep learning and computer vision.
Scott Fahlman
Emeritus Faculty, Carnegie Mellon University
sef@cs.cmu.edu
He has been working on areas related to AI and the development of algorithms for
artificial neural networks. He would be an ideal reviewer for my paper.
Jia Deng
Assistant professor, University of Michigan
jiadeng@umich.edu
His interests include computer vision and machine learning.
Neural Computing and Applications manuscript No.
(will be inserted by the editor)
Grammatical facial expression recognition using customized
deep neural network architecture
Devesh Walawalkar
Received: date / Accepted: date
Abstract This paper proposes to expand the visual
understanding capacity of computers by helping them
recognize human sign language more efficiently. This is
carried out through recognition of the facial expressions
that accompany the hand signs used in this language.
The paper focuses specifically on the popular Brazilian
sign language (LIBRAS). While much literature has
been dedicated to classifying different hand signs into
their respective word meanings, the emotion or intention
with which the words are expressed has largely not been
taken into consideration. As we know from everyday
human experience, words expressed with different emotions
or moods can carry completely different meanings.
Lending computers the ability to classify these facial
expressions can add another level of understanding of
what a deaf person exactly wants to communicate. The
proposed idea is implemented through a deep neural
network with a customized architecture, which learns
the specific patterns in individual expressions much
better than a generic approach. With an overall accuracy
of 98.04%, the implemented deep network performs
excellently and is thus fit to be used in practical
scenarios.
Keywords Computer Vision · Grammatical facial
expression recognition · Brazilian sign language
classification · Customized deep neural network
Devesh Walawalkar
Bachelor of Technology in Electronics from V.J.T.I., Mumbai
1005, 11th floor, Hrishikesh Apts.,
Veer Savarkar Marg, Dadar (W), Mumbai, India - 400028
Tel.: +919820143154
E-mail: devwalkar64@gmail.com
ORCID: 0000-0001-9464-9027
1 Introduction
Sign language is an essential medium used by deaf
people to communicate with others in their environment.
Since sign language does not have a speech component,
through which an average human conveys the emotion
behind what he says or does, facial expressions assume
this important role in a sign language. A computer
trained to understand the language only through hand
gestures would fail to grasp the semantic and structural
context of what the person is trying to convey. A lot of
literature on this topic [1,3,4,8,9,12,16–18] has focused
primarily on sign language recognition through hand
gestures alone, without considering the facial expression
aspect. Combining classification of facial expressions
with hand gestures results in a more efficient
interpretation [14,19]. These facial expressions are called
‘Grammatical Facial Expressions’ (GFEs) because they
help resolve semantic-level ambiguity in human sign
language. Facial expression recognition has attracted
attention over recent years because it is useful in many
applications, such as speech recording systems that
convert sign language to normal language text, or
subtitling a video in which sign language is conveyed.
Neural network techniques are used for this task as they
are very efficient in learning complex functions when
given enough training data. Previous work on GFE
classification [2] is based upon traditional classification
methods and thus fails to leverage the potential of recent
deep learning developments. This paper presents a
comparison of the proposed model's performance with
that reported by Freitas et al. [10], with both models
being computed on the same dataset. Performance
comparisons with a generic fully connected neural
network are also presented.
The paper is structured as follows: 1] demonstration
of the fundamental classes (markers) through which a
wide variety of GFEs can be classified; 2] the incorporated
dataset and its detailed description; 3] implementation of
the customized deep neural network architecture; 4]
network initialization and hyperparameter tuning; 5] the
cost function and optimization algorithm used; 6] binary
and multiclass classification performance results; 7]
comparison of binary classification performance with
that of an accepted method in the literature; and 8] a
final discussion of the achieved results and their
implications.
2 Importance of grammatical facial expressions
Sign language consists of mainly two components: manual
and non-manual [5]. The manual components consist
of hand shape, palm orientation and arm movement.
The non-manual components consist of facial expressions,
pose and mouth movement. Some signs can be
distinguished from manual components alone, while the
rest need the additional non-manual component to
distinguish them. The Brazilian sign language system
contains certain words which have nearly identical hand
sign formations; they differ from each other only in the
facial expression with which they are said. Hence, sign
language recognition through manual cues only leads to
inefficient and ambiguous classification. Facial
expressions play a vital role in effectively communicating
information to the listener. In a common language such
as English, the exclamation mark, the question mark, the
comma etc. provide the emotion attached to a sentence
by its source. Just as moving a comma to different
positions in the same sentence can completely change
its meaning, the change or absence of GFEs in a sign
language can completely change the meaning of what is
signed.
3 Database Collection
This paper is based upon empirical results computed
on the ‘Grammatical Facial Expression Dataset’, created
by Freitas et al. [10] and obtained under public license
from the University of California, Irvine machine
learning repository [15]. This dataset is based upon
facial expressions made by a sign language performer
(further referred to as the user), captured through
individual video frames. There are eight fundamental
types of grammatical markers in the Brazilian sign
language (Libras) system, as stated by Brito [6] and
de Quadros et al. [7]. These are as follows, along with
their meanings:
Wh question: generally used for questions with
Who, What, When, Where, How and Why;
Yes/no question: used when asking a question to
which there is a ‘yes’ or ‘no’ answer;
Doubt question: not a ‘true’ question, since an
answer is not expected; it is used to emphasize the
information that will be supplied;
Topic: used when one of the sentence's constituents
is displaced to the beginning of the sentence;
Negative: used in negative sentences;
Assertion: used when making assertions;
Conditional: used in a subordinate sentence to
indicate a prerequisite to the main sentence;
Focus: used to highlight new information in the
speech pattern.
Fig. 1: Attribute point locations on the user's face
The dataset consists of 225 videos recorded in five
different recording sessions carried out with the user.
In each session, one performance of each sentence was
recorded. The user was asked to perform sentences
from each of the above types (with an additional
Relative marker type, which is used at the start of a
clause in a sentence). Examples of these markers,
present individually in sentences in the common
English language, are given as follows [10]:
Conditional:
1] If you miss, you lose.
2] If it's sunny, I go to the beach.
3] If you don't want, he wants.
4] If it rains, I don't go!
Assertion:
1] I bought that!
2] I work there!
3] I want it!
4] I go!
Negative:
1] I never have been in jail!
2] I didn't do anything!
3] I don't like it!
4] I don't have that!
Relative:
1] That enterprise? ... Its business is technology!
2] The girl who fell from the bike? ... She is in the hospital!
3] Celi, the deaf school, is located in anonymous!
4] Waine, who is Lucas's friend, is graduated in Pedagogy!
Focus:
1] The bike is BROKEN.
2] It was WAYNE who did that!
3] It was Wayne who paid for it!
4] I like BLUE.
Topics:
1] I have a notebook!
2] Fruits ... I like pineapple!
3] I work with technology!
4] Sport ... I like volleyball!
Doubt questions:
1] Did you GRADUATE?
2] Did Waine buy A CAR?
3] Do you GO AWAY?
4] Is this YOURS?
Wh-questions:
1] What is this?
2] Where do you live?
3] How do you do that?
4] When did Waine pay?
Yes/no questions:
1] Did he go away?
2] Is this yours?
3] Did you graduate?
4] Did Waine buy a car?
Multiple frames were captured from each of these
marker videos and predefined attribute face points
(Figure 1) were located in each of these frames. The X,
Y (frontal image plane) and Z (depth) coordinates of
each of these 100 attributes were recorded for each
frame using a Microsoft Kinect™ sensor. These frames
were then hand-classified, as a binary classification
task for each individual class, with the help of a sign
language expert. This procedure was carried out for
two users (A and B) so as to reduce any particular user
bias present in the acquired data. The dataset contains
27,965 frames in total, classified into 18 different
classes (9 for each user). A description of the dataset
constituents can be seen in Table 1.
Table 1: Number of positive and negative samples for each GFE class

GFE class type     Positive samples   Negative samples
Assertion          541                644
Yes/no question    734                841
Negative           568                596
Topic              510                1789
Conditional        448                1486
Doubt question     1100               421
Focus              446                863
Relative           981                1682
Wh question        643                962
The implemented test set comprises 30% of the total
available dataset. This gives a sample (frame) count of
400 - 450 samples per class (for binary classification)
for each user.
4 Data pre-processing
For each attribute, the (X, Y) coordinate points are
given in pixels, whereas the Z coordinate is given in mm.
As the two units differ, and hence their numerical
ranges differ, Z-score standardization is performed on
the dataset before using it in experimentation. This
also makes the learnt model invariant to the location
of the face in the captured frame (i.e. to a different
set of attribute numerical values). Some isolated
coordinate values missing from the dataset are
represented by a placeholder value of ‘0.0’. Such
arbitrary values could lead to wrong model learning.
Hence, these values are replaced by the mean of that
particular attribute point's (X, Y or Z) remaining
sample values present
in the dataset. This particular modification was
validated by enhanced model performance. Each
marker's binary classification dataset contains an
unequal number of positive and negative cases, with
the negative count on average being much greater than
the positive one (refer to Table 1). This might lead the
model to be slightly biased towards learning the
negative pattern. Hence, an equal number of samples
from both classes is used for training. This leads to an
appreciable increase in model performance.
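The following is a minimal sketch of this pre-processing pipeline in Python with Pandas (the library used in this work, per Section 6). The file name, column layout and the use of scikit-learn for the 30% test split are assumptions for illustration, not the dataset's actual schema or the author's exact code.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical schema: 300 coordinate columns (X, Y, Z for each of the
# 100 attribute points) plus a binary 'label' column.
df = pd.read_csv('gfe_marker_frames.csv')  # file name is illustrative
coord_cols = [c for c in df.columns if c != 'label']

# Missing coordinates are stored as the placeholder 0.0; replace each
# with the mean of that attribute's remaining sample values.
df[coord_cols] = df[coord_cols].replace(0.0, np.nan)
df[coord_cols] = df[coord_cols].fillna(df[coord_cols].mean())

# Z-score standardization per attribute, so that pixel (X, Y) and
# millimetre (Z) units share a common numerical range.
df[coord_cols] = (df[coord_cols] - df[coord_cols].mean()) / df[coord_cols].std()

# Balance positive and negative classes by downsampling the majority class.
n_min = df['label'].value_counts().min()
balanced = (df.groupby('label', group_keys=False)
              .apply(lambda g: g.sample(n_min, random_state=0)))

# Hold out 30% of the samples as the test set, as stated in Section 3.
train_df, test_df = train_test_split(balanced, test_size=0.3,
                                     stratify=balanced['label'], random_state=0)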
5 Deep neural network architecture
For this model, a customized feed-forward deep network
architecture was implemented. It consists of two
hidden layers along with the standard input and output
layers. The entire customized architecture can be
referred to in Figure 2. Here, for each sample (frame),
each attribute point's standardized X, Y, Z coordinates
are fed to a single neuron in the first hidden layer.
Thus, the 100 neurons present in the first hidden layer
are tuned to learn patterns in their respective attribute
point's coordinates. The space represented by the first
layer can be expressed as
H_1 = \{V_0, V_1, V_2, \ldots, V_n, \ldots, V_{99}\}    (1)
where V_n = \{X_n, Y_n, Z_n\}
Table 2: Attribute point groups for different user face regions
User Face region Attribute point range
Left eye 0-7
Right eye 8-15
Left eyebrow 16-25
Right eyebrow 26-35
Nose 36-47
Mouth 48-67
Face contour 68-86
Left & Right iris+nose tip 87-89
Line above left eyebrow 90-94
Line above right eyebrow 95-99
Subsequently, various clusters of these neurons are
fed to specific neurons in the second hidden layer. As
seen in Figure 1, certain clusters of attribute points
(i.e. first-layer neurons) represent specific parts of the
human face. These clusters can be referenced from
Table 2. Each second-layer neuron is thus tuned to
learn individual patterns in a specific face region, such
as the left/right eye, nose, mouth etc., respectively.
This hidden layer space can be represented as,
H_2 = \{H1_0, H1_1, H1_2, \ldots, H1_n, \ldots, H1_9\}    (2)
where H1_n \in \{V_0-V_7, V_8-V_{15}, V_{16}-V_{25}, V_{26}-V_{35},
V_{36}-V_{47}, V_{48}-V_{67}, V_{68}-V_{86}, V_{87}-V_{89}, V_{90}-V_{94},
V_{95}-V_{99}\}
The output layer consists of two neurons in the case of
the binary classification task, and from three to nine
neurons in the case of multiclass classification. The
second hidden layer is fully connected to each output
neuron. This enables the output layer neurons to fully
learn patterns from each of the face regions present in
the H2 space. Each neuron in the architecture has an
individual bias weight attached to it.
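A minimal tf.keras sketch of this topology is given below for illustration. The paper's network was built with TensorFlow (Section 6), but the exact API, the PerPointDense layer and the build_model() helper are assumptions of this sketch, not the author's code; the Xavier initialization, tanh activations, softmax output and l2 regularization follow Sections 7 and 8.

import tensorflow as tf
from tensorflow.keras import layers, Model, regularizers

# Inclusive attribute-point ranges for each face region (Table 2).
REGIONS = [(0, 7), (8, 15), (16, 25), (26, 35), (36, 47),
           (48, 67), (68, 86), (87, 89), (90, 94), (95, 99)]

class PerPointDense(layers.Layer):
    # First hidden layer: one independent neuron per attribute point,
    # each seeing only that point's (X, Y, Z) triple.
    def build(self, input_shape):
        n_points, n_coords = input_shape[1], input_shape[2]
        self.w = self.add_weight(shape=(n_points, n_coords), name='w',
                                 initializer='glorot_uniform')  # Xavier init [11]
        self.b = self.add_weight(shape=(n_points,), name='b',
                                 initializer='zeros')
    def call(self, x):
        return tf.tanh(tf.reduce_sum(x * self.w, axis=-1) + self.b)

def build_model(n_classes=2, beta=0.05):
    inp = layers.Input(shape=(100, 3))  # 100 standardized (X, Y, Z) triples
    h1 = PerPointDense()(inp)           # H1 space: 100 activations
    # Second hidden layer: one neuron per face region, connected only to
    # its own cluster of first-layer neurons.
    h2 = layers.Concatenate()([
        layers.Dense(1, activation='tanh', kernel_initializer='glorot_uniform',
                     kernel_regularizer=regularizers.l2(beta))(h1[:, lo:hi + 1])
        for lo, hi in REGIONS])         # H2 space: 10 activations
    # Output layer, fully connected to H2, softmax over the GFE classes.
    out = layers.Dense(n_classes, activation='softmax',
                       kernel_initializer='glorot_uniform',
                       kernel_regularizer=regularizers.l2(beta))(h2)
    return Model(inp, out)

model = build_model(n_classes=2)  # binary GFE vs non-GFE classifier

Note that every neuron here carries an individual bias weight (the b term in PerPointDense, and Dense's default use_bias=True), matching the description above.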
6 Experimental setup
The experimental results mentioned below were computed
in a Python environment. The deep neural network
was implemented and trained using the TensorFlow™
open source library. The data pre-processing was
implemented using the Pandas™ open source library.
Table 3: Model hyperparameters with optimized values

Hyperparameter              Optimized value
Initial learning rate       0.01
Learning rate decay ratio   0.9
Rate decay step             7000
Regularization beta         0.05
Epochs                      750
7 Initialization and hyperparameter tuning
The weights of the entire network and their biases are
initialized using the Xavier initialization method [11].
Empirically, this initialization is found to perform
better than random initialization for this model, in
turn helping the cost function start closer to its global
minimum. The hyperparameters of the network are
optimized based upon performance comparison of
different models having varying hyperparameter values.
The optimized values used for model training are shown
in Table 3. The ‘tanh’ activation function is preferred
over other functions owing to its better performance for
this model. The ‘softmax with cross entropy’ function is
used as the activation for the output layer neurons.
* Standardized input data using the Z-score method.
Fig. 2: Customized deep neural network architecture
8 Network training
The ‘mean squared error’ function is used as the cost
function for this model, expressed as the difference
between the model predictions and the true output
values. For training, the ‘Adam’ optimization algorithm
[13] was implemented owing to its faster convergence
rate, its computational efficiency and its lesser
dependence on hyperparameter tuning. The model
learning rate is decreased exponentially with a decay
ratio of 0.9 every 7000 iteration steps. This helps the
cost function reach its global minimum quicker, while
not overshooting and missing the minimum when close
to it, as could happen with the larger initial learning
rate. To increase generalization capacity and avoid
overfitting of the model, l2-norm regularization is added
to the cost function as a product term with the
hyperparameter ‘Regularization beta’, which controls its
contribution.
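Continuing the same hedged sketch from Section 5, the training setup described in this section and Table 3 (Adam, staircase exponential decay, mean squared error cost, 750 epochs) might be wired up as follows; x_train, y_train, x_test and y_test stand for the pre-processed, one-hot-labelled splits.

import tensorflow as tf

# Exponential learning-rate decay: start at 0.01, multiply by 0.9 every
# 7000 iteration steps (Table 3).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=7000, decay_rate=0.9,
    staircase=True)

# Adam optimizer [13] with the decaying rate; mean squared error between
# predictions and one-hot true labels as the cost function. The l2(0.05)
# kernel regularizers attached in the architecture sketch contribute the
# 'Regularization beta' penalty term to this loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='mse', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=750, validation_data=(x_test, y_test))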
9 Multiclass classification
To train the network to distinguish between multiple
markers, four models having different numbers of
markers to classify are implemented. For training and
testing these models, positive samples from each marker
class in the incorporated dataset are combined to form
separate data subsets. This aggregates to about 200 -
225 samples per marker class per user. The computed
models are as follows:
9.1 Three-class classification
For this model, various three-class combinations of the
nine available classes were implemented. The
architecture modification is three neurons in the output
layer, activated by the ‘softmax’ function. The rest of
the network hyperparameters were kept the same as in
the binary classification case. Marker combinations are
selected separately for User A and User B. Model
accuracy results are shown in Table 4.
9.2 Five-class classification
For this model, various possible five-class combinations
of the nine available classes were implemented. The
architecture modification is five neurons in the output
layer, activated by the ‘softmax’ function. Network
hyperparameters were kept the same as in the binary
classification case. Marker combinations are selected
separately for User A and User B. Model accuracy
results are shown in Table 5.
9.3 Seven-class classification
For this model, various possible seven-class combinations
of the nine available classes were implemented. The
architecture modification is seven neurons in the output
layer, activated by the ‘softmax’ function. Network
hyperparameters were kept the same as in the binary
classification case. Marker combinations are selected
separately for User A and User B. Model accuracy
results are shown in Table 6.
Table 4: Multiclass classification accuracy for three classes

Class combinations^a   Test set accuracy (%)
                       User A    User B
A,YN,R                 97.32     97.89
A,R,F                  97.79     97.65
A,R,F                  97.54     97.31
A,F,T                  97.12     96.97
A,F,C                  96.43     96.12
A,W,D                  97.57     97.78
YN,R,N                 98.42     98.10
N,D,T                  97.15     97.23
Mean                   97.42     97.38

^a A = Assertion, YN = Yes/no question, R = Relative, F = Focus,
T = Topic, C = Conditional, D = Doubt question, W = Wh question,
N = Negative
Table 5: Multiclass classification accuracy for five classes

Class combinations^a   Test set accuracy (%)
                       User A    User B
A,YN,R,C,N             95.62     95.83
A,R,F,R,N              96.09     96.25
A,F,T,YN,C             96.23     96.47
A,F,C,R,N              95.51     95.19
A,W,D,R,YN             95.72     95.32
YN,R,N,A,W             95.41     95.83
Mean                   95.76     95.82

^a A = Assertion, YN = Yes/no question, R = Relative, F = Focus,
T = Topic, C = Conditional, D = Doubt question, W = Wh question,
N = Negative
Table 6: Multiclass classification accuracy for seven classes

Class combinations^a   Test set accuracy (%)
                       User A    User B
A,YN,R,C,N,D,W         95.13     95.36
A,R,F,R,N,D,W          95.17     95.63
A,F,T,YN,C,R,N         94.74     94.93
A,F,C,R,N,W,YN         94.92     94.83
A,T,D,R,YN,N,C         95.07     95.12
YN,R,N,A,W,T,C         95.10     95.19
Mean                   95.02     95.18

^a A = Assertion, YN = Yes/no question, R = Relative, F = Focus,
T = Topic, C = Conditional, D = Doubt question, W = Wh question,
N = Negative
9.4 Nine-class classification
For this model, all nine possible classes were implemented
together. The model predicted the markers with 95.11%
accuracy for User A, 94.93% for User B and 95.06% for
Users A and B together.
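As a usage note for the hypothetical build_model() helper sketched in Section 5, each multiclass variant only widens the output layer; for nine classes:

# All other hyperparameters are kept as in the binary case (Table 3).
model = build_model(n_classes=9)  # softmax over all nine marker classes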
Table 7: Binary classification test set accuracy for all nine classes in comparison with those of a generic fully connected network

                   Test set accuracy (%)
                   Proposed model                    Fully connected network
Class type         User A   User B   Both users     User A   User B   Both users
Affirmative        98.27    98.60    98.37          78.32    77.17    73.83
Conditional        97.92    97.86    97.79          76.42    76.91    76.39
Relative           97.34    97.73    97.49          82.32    80.63    81.59
Negative           98.48    98.34    98.22          80.45    79.71    79.18
Wh question        97.36    97.83    97.51          76.82    75.43    75.71
Yes/no question    98.02    98.49    98.28          74.51    74.38    74.41
Doubt question     98.47    98.36    98.11          78.52    78.64    78.01
Topics             97.71    97.59    97.46          81.98    81.27    80.19
Focus              98.76    98.25    98.33          75.65    75.17    74.93
Aggregate mean     98.04    98.12    97.95          78.33    77.70    77.14
Overall mean accuracy:  98.04                       77.72
Table 8: Binary classification performance comparison in terms of F-score, precision and recall

                   Freitas et al. [10]^a             This paper
Class type         F-score  Precision  Recall        F-score  Precision  Recall
Assertion          0.89     0.98       0.90          0.98     0.97       0.98
Conditional        0.68     0.91       0.55          0.94     0.93       0.96
Relative           0.77     0.99       0.67          0.96     0.99       0.96
Negative           0.69     0.67       0.96          0.96     0.94       0.96
Wh question        0.87     0.96       0.81          0.98     0.99       0.96
Yes/no question    0.83     0.98       0.73          0.94     0.96       0.95
Doubt question     0.89     0.87       0.94          0.99     0.97       0.98
Topics             0.90     0.95       0.85          0.98     0.98       0.97
Focus              0.91     0.94       0.89          0.99     0.98       0.98

^a Each F-score, precision and recall value considered is the maximum
taken across the four method variations in Freitas et al. [10]
10 Results
The accuracy of the proposed model on each marker
individually, as a binary classification task, is shown in
Table 7. The table also compares the proposed model's
accuracy with that of a generic fully connected network
having exactly the same number of input, hidden and
output layer neurons, with all model hyperparameters
and the optimization algorithm kept identical. The
overall accuracy is calculated as the mean of the
individual class mean accuracies over the three variants
(refer to Table 7), which results in 98.04%. As each
marker's test set differs in number of samples, a better
performance representation in terms of F-score,
precision and recall is given in Table 8. These values
are further compared with Freitas et al. [10]. For the
multiclass classification task, the accuracies of the three
models implemented for each user can be referred to in
Tables 4, 5 and 6.
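For completeness, the per-marker figures in Table 8 are standard binary-classification metrics; a sketch of how they could be computed with scikit-learn is shown below. This is an assumed reconstruction (the paper does not state its evaluation code), and y_true and x_test are placeholders for one marker's test split.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Predictions are taken as the argmax over the two softmax outputs.
y_pred = model.predict(x_test).argmax(axis=1)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_true, y_pred, average='binary')
print(f'accuracy={accuracy_score(y_true, y_pred):.4f} '
      f'precision={precision:.4f} recall={recall:.4f} f_score={f_score:.4f}')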
11 Discussion
The F-score, precision and recall values obtained are
much improved compared to those of the accepted
method in the literature. This validates that a
customized network architecture, as proposed here, is
better able to learn the correlation patterns in different
face region coordinates for a particular expression type.
The learning constraints imposed on the hidden layer
neurons, in the form of the customized architecture,
help the model learn the patterns more accurately than
a generic fully connected one (Table 7). For multiclass
classification, the model performs equally well and its
accuracy remains mostly constant over a range of
different numbers of markers to classify. This also
supports the notion that deep learning techniques have
better potential than standard machine learning
techniques in classification tasks.
Certain considerations include the fact that the
proposed model was tested on a limited dataset. This
leaves it somewhat untested against higher variance in
certain attribute coordinates. Also, this method was
tested on only the two users available in the dataset.
More users would exhibit GFEs with higher degrees of
variance in terms of their face structure and manner of
expressing a particular GFE. A larger and more varied
dataset can thus help to better train and further
validate the proposed model.
12 Conclusion
The overall accuracy of the proposed method is excellent,
such that it can reliably be used for classifying GFEs
captured in the form of video frames. The model
performs equally well in both the binary and multiclass
classification tasks, demonstrating its ability to correctly
distinguish between a GFE and a non-GFE and between
different types of GFEs. Further work on this topic
would involve combining the proposed model with
currently accepted methods of hand sign classification,
in order to create a much more complete human sign
language recognition system.
Acknowledgements I would like to specially thank
Fernando de Almeida Freitas (Freitas, F. A.), Felipe
Venâncio Barbosa (Barbosa, F. V.) and Sarajane Marques
Peres (Peres, S. M.) for creating the Grammatical Facial
Expressions dataset, and the University of São Paulo for
making it available under public license. I would also like
to mention the University of California, Irvine machine
learning repository for hosting and maintaining this dataset.
References
1. Anjo M D S, Pizzolato E B, Feuerstack S (2012, Novem-
ber) A real-time system to recognize static gestures of
brazilian sign language (libras) alphabet using kinect. In
Proceedings of the 11th Brazilian Symposium on Human
Factors in Computing Systems: 59-268. Brazilian Computer
Society.
2. Ari I, Uyar A, Akarun L (2008, October) Facial feature
tracking and expression recognition for sign language. In
23rd International Symposium on Computer and Information
Sciences (ISCIS), 2008: 1-6
3. Bastos I L, Angelo M F, Loula A C (2015, August) Recog-
nition of static gestures applied to Brazilian sign language
(Libras). In 28th SIBGRAPI Conference on Graphics, Pat-
terns and Images (SIBGRAPI), 2015: 305-312
4. Bedregal B C, Costa A C, Dimuro G P (2006, August)
Fuzzy rule-based hand gesture recognition. In IFIP Inter-
national Conference on Artificial Intelligence in Theory and
Practice Springer, Boston, MA : 285-294
5. Bridges B, Metzger, M (1996) Deaf tend your: Non-manual
Signals in American Sign Language. Calliope Press.
6. Brito L F (1995) Por uma gramática de línguas de sinais.
Tempo Brasileiro.
7. de Quadros R M, Karnopp L B (2009) Língua de sinais
brasileira: estudos linguísticos. Artmed Editora.
8. de Souza C R, Pizzolato E B (2013, July) Sign language
recognition with support vector machines and hidden con-
ditional random fields: going from fingerspelling to natural
articulated words. In International Workshop on Machine
Learning and Data Mining in Pattern Recognition : 84-98
Springer Berlin Heidelberg
9. Dias D B, Madeo R C, Rocha T, Bíscaro H H, Peres S M
(2009, June) Hand movement recognition for brazilian sign
language: a study using distance-based neural networks. In
IJCNN 2009. International Joint Conference on neural net-
works : 697-704
10. Freitas F A, Peres S M, Lima C A M, Barbosa F V (2014)
Grammatical Facial Expressions Recognition with Machine
Learning. In: 27th Florida Artificial Intelligence Research
Society Conference (FLAIRS), 2014, Pensacola Beach. Pro-
ceedings of the 27th Florida Artificial Intelligence Research
Society Conference (FLAIRS). Palo Alto: The AAAI Press:
180-185
11. Glorot X, Bengio Y (2010, March) Understanding the
difficulty of training deep feedforward neural networks. In
Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics :249-256
12. Kelly D, Reilly Delannoy J, Mc Donald J, Markham C
(2009, November) A framework for continuous multimodal
sign language recognition. In Proceedings of the 2009 inter-
national conference on Multimodal interfaces: 351-358
13. Kingma D, Ba J (2014) Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980.
14. Krňoul Z, Hrúz M, Campr P (2010, October) Correlation
analysis of facial features and sign gestures. In IEEE 10th
International Conference on Signal Processing (ICSP), 2010:
732-735
15. Lichman M (2013) UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of
California, School of Information and Computer Science.
16. Pistori H, Neto J (2004) An experiment on handshape
sign recognition using adaptive technology: Preliminary re-
sults. Advances in Artificial Intelligence — SBIA 2004: 763-
801
17. Pizzolato E B, dos Santos Anjo M, Pedroso G C (2010,
March) Automatic recognition of finger spelling for libras
based on a two-layer architecture. In Proceedings of the
2010 ACM Symposium on Applied Computing: 969-973
18. Porfirio A J, Wiggers K L, Oliveira L E, Weingaert-
ner D (2013, October) LIBRAS sign language hand con-
figuration recognition based on 3D meshes. In IEEE In-
ternational Conference on Systems, Man, and Cybernetics
(SMC), 2013: 1588-1593
19. Von Agris U, Knorr M, Kraiss K F (2008, September)
The significance of facial features for automatic sign lan-
guage recognition. In 8th IEEE International Conference
on Automatic Face & Gesture Recognition, 2008: 1-6